PAD Performance Measures: We've Come a Long Way, but the Journey Is Not Over

Last Updated: February 28, 2022


Disclosure: None.
Pub Date: Monday, Nov 29, 2010
Author: Elizabeth DeLong, PhD

Medical costs are spiraling out of control; the cost of care in the United States is among the highest in the developed world, while some of our outcomes are embarrassingly poor. Unexplained discrepancies in both access and care have created a mandate for reform of the entire system. Health care payers are now demanding accountability and are implementing mechanisms for rewarding or penalizing performance. Meanwhile, medical professional societies are attempting to bring some order to the chaos by standardizing care through the creation of performance measures that are based on scientific evidence.

In an effort to translate the benefits of science into practice, expert panels have come together to assess and synthesize the results of clinical studies into treatment guidelines that represent the optimal treatment for the average patient. Adherence to these guidelines has been shown to improve outcomes in several areas, prompting the advancement of performance metrics that correspond to these optimal treatment recommendations. The creation of performance measures in a specific disease area is a major undertaking, requiring not only expertise and experience in the area, but also perspective, judgment, and foresight. The PAD Writing Committee has done a thorough and meticulous job of selecting and specifying such measures by first outlining and adhering to principles in the Performance Measure Survey Form and Exclusion Criteria Definitions. After accumulating a collection of guidelines and evidence, they formally considered and specified a large number of potential guideline-associated measures. Specification involved details such as the designation of target population, denominator and numerator, potential exclusions, and time frame for applicability.

Each of the specified measures was next reviewed with regard to 1) availability and strength of supporting evidence, 2) whether it was clinically meaningful, 3) whether it would be both actionable and feasible to implement, and 4) whether it was reliable across settings. This process resulted in 37 fully specified measures, among which only seven were put forth as true performance measures and an additional two were included as test measures. However, the selection and specification process does not complete the task. As always, "the devil is in the details," and, in this case, the devil is in the implementation and interpretation. Implementation will necessarily involve standardized protocols and coding algorithms across settings; interpretation will depend on consistent implementation and will also involve both analysis and attribution.

Inconsistencies in exact coding across settings have the potential to undermine criteria such as the requirement of reliability across settings. A good example is given by the specification of the Ankle Brachial Index (ABI) measure. The Method of Reporting for this measure, which applies to eligible patients who are designated according to age and risk, is "Whether an ABI was performed at least once in the last 5 years." One of the age categories included in the denominator consists of patients who are between 50 and 69 years of age and have a history of smoking or diabetes. How will a 5-year data snapshot classify the 52-year-old patient who is included in the denominator but has not yet had an ABI measured? How about the 53-year-old who develops diabetes at age 52? Will only patients who have been eligible for a full 5 years be included in the denominator?
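The boundary cases above can be made explicit in code. This is only a sketch under assumed coding rules — the measure specification does not say how partial eligibility windows are handled, and the patient fields and helper names here are invented for illustration:

```python
from dataclasses import dataclass
from typing import Optional

LOOKBACK_YEARS = 5  # "Whether an ABI was performed at least once in the last 5 years"

@dataclass
class Patient:
    age: int
    risk_onset_age: Optional[int]        # age when smoking/diabetes history began, if any
    last_abi_years_ago: Optional[float]  # None if an ABI was never measured

def in_denominator(p: Patient) -> bool:
    """One denominator category: age 50-69 with a history of smoking or diabetes."""
    return 50 <= p.age <= 69 and p.risk_onset_age is not None

def years_eligible(p: Patient) -> int:
    """How long the patient has actually met the criteria -- possibly fewer than 5 years."""
    return max(0, min(p.age - 50, p.age - p.risk_onset_age))

def in_numerator(p: Patient) -> bool:
    return p.last_abi_years_ago is not None and p.last_abi_years_ago <= LOOKBACK_YEARS

# The 52-year-old smoker who has never had an ABI: in the denominator,
# but has met the criteria for only 2 of the 5 lookback years.
p1 = Patient(age=52, risk_onset_age=30, last_abi_years_ago=None)
print(in_denominator(p1), years_eligible(p1), in_numerator(p1))  # True 2 False

# The 53-year-old who developed diabetes at age 52: eligible for only 1 year.
p2 = Patient(age=53, risk_onset_age=52, last_abi_years_ago=None)
print(in_denominator(p2), years_eligible(p2))  # True 1
```

Whether such patients count against the measure, or are excluded until they have been eligible for a full 5 years, is exactly the coding decision the questions above leave open; any two settings that answer it differently will report incomparable rates.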

Analysis and attribution are intimately related and extremely important in light of current efforts to rank and reward performance based on benchmarks established by these metrics. The naive numerator/denominator approach to performance measurement seems clear and understandable. However, as indicated in the "Challenges to Implementation" section, sample sizes may preclude using the individual clinician as a denominator. Accumulating the denominator over all patients in a particular clinical setting ignores the clustering of patients within lower-level entities such as clinicians or practices and runs the risk of masking true performance. Consider, for example, the setting in which 10 practitioners have individual performance rates ranging between 60% and 95%, with varying sample sizes and an overall rate of 84%. Ignoring the variability among these measures will produce a deceptive level of confidence. However, taking the average performance of each of the lower-level entities ignores potential variability in sample sizes. Statistical considerations for such clustered data need to be employed.
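The pitfall can be made concrete with hypothetical counts in the spirit of the 10-practitioner example (the per-clinician sample sizes below are invented, not the commentary's data): pooling all patients into one numerator and denominator and averaging the clinicians' individual rates give different answers, and neither one conveys the spread from 60% to 95%.

```python
# Hypothetical (successes, patients) for 10 practitioners; rates span 60%-95%.
clinicians = [
    (12, 20), (95, 100), (30, 40), (85, 100), (17, 20),
    (160, 200), (45, 50), (140, 160), (70, 100), (180, 200),
]

rates = [s / n for s, n in clinicians]

# Pooled rate: one big numerator over one big denominator (weights by patient volume).
pooled = sum(s for s, _ in clinicians) / sum(n for _, n in clinicians)

# Unweighted mean of per-clinician rates (ignores sample-size differences).
mean_of_rates = sum(rates) / len(rates)

print(f"range:  {min(rates):.0%}-{max(rates):.0%}")  # 60%-95%
print(f"pooled: {pooled:.1%}")                       # 84.2%
print(f"mean:   {mean_of_rates:.1%}")                # 81.8%
```

Neither summary is adequate on its own; hierarchical (random-effects) models are the standard statistical tool for such clustered data, because they account for both the between-clinician variability and the differing sample sizes.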

Another issue regarding the analysis is the opportunity for confounding to camouflage any comparisons being made. The following example of confounding due to age group (Table 1, paraphrased from the Wikipedia site http://en.wikipedia.org/wiki/Simpson%27s_paradox) clearly illustrates the danger. Suppose it is more likely for the younger group of patients to be sent for the ABI test and that most of Hospital B's patients come from this group.

In this example, Hospital A performs better than Hospital B in both age groups, but overall Hospital B looks better if stratification by age groups is not implemented.
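Since Table 1 is not reproduced here, the counts below are hypothetical, constructed only to exhibit the same paradox: Hospital A has the higher ABI-testing rate within each age group, yet Hospital B has the higher rate overall because most of its patients fall in the younger, more frequently tested group.

```python
# Hypothetical counts (not the commentary's Table 1): (tested, total) per age stratum.
counts = {
    "A": {"young": (90, 100), "old": (300, 400)},
    "B": {"young": (870, 1000), "old": (73, 100)},
}

def rate(tested, total):
    return tested / total

# Hospital A wins within every age group...
for group in ("young", "old"):
    assert rate(*counts["A"][group]) > rate(*counts["B"][group])

def overall(hospital):
    tested = sum(t for t, _ in counts[hospital].values())
    total = sum(n for _, n in counts[hospital].values())
    return tested / total

# ...but pooled over both groups the ordering flips: Simpson's paradox.
print(f"A overall: {overall('A'):.1%}")  # 78.0%
print(f"B overall: {overall('B'):.1%}")  # 85.7%
assert overall("B") > overall("A")
```

The flip happens because the overall rate is a patient-weighted average of the stratum rates, and Hospital B's weight sits almost entirely on the high-rate younger stratum — which is why comparisons must be stratified (or adjusted) by the confounder.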

At the end of the process are the interpretation and attribution. One of the performance measures put forth under the categories of Patient Education, Treatment, and Self-Management/Compliance is smoking cessation. The measure is counted if tobacco users have received cessation intervention, which may include smoking-cessation counseling (e.g., verbal advice to quit, referral to a smoking-cessation program or counselor) and/or pharmacologic therapy. The type of intervention should be explicitly captured. However, once captured, how will this measure be interpreted? Will all forms of "cessation intervention," from verbal advice to quit to pharmacologic therapy, be counted the same? Will there be more credit given for higher levels of intervention? And what about the patient who is seen multiple times during the 2-year measurement period; what is that patient's "per patient" measure if pharmacologic therapy is given the first time, but the patient fails to comply? To get full credit, especially in a situation for which pharmacologic therapy provides more credit, does the physician continue to prescribe the therapy? Will there be insurance barriers? This latter point leads to the attribution aspect. How much responsibility does the clinician bear when a patient doesn't comply with a recommendation or test? In a similar situation, is the high school teacher responsible for a student's performance if the student isn't getting the right amount of sleep or appropriate nutrition? The physician can no more follow the patient home to ensure compliance than the teacher can follow the student home. At some point, human behavior will play a role.

And now we come to the most dangerous aspect of performance measurement: the unintended consequences. The statin recommendation for almost all PAD patients presents a good example. If prescriptions need to be filled in order to count, there is a risk that patients who are likely to be compliant, at least to the point of filling the prescription, will be preferentially treated. More generally, when the guidelines for a specific condition become onerous, there are two potentially negative side effects. First, clinicians may subconsciously avoid patients with the condition. Second, and more likely, they will devote so much of their time to completing the checklist that they fail to notice and treat a more serious problem.

There is no denying that medical care is complex. Each presenting patient is an individual, the product of a nearly infinite combination of genes and environment. To appropriately diagnose and treat that individual can be considered an art, relying on both science and intuition. The appropriate balance between science and intuition is a matter of concern. To the extent that guidelines rely on credible evidence, performance measures are clear and analyzable, and attribution is unquestionable, the scale should certainly tip in the direction of science. In those cases where ambiguity creates a clouded picture, the value of intuition should not be discounted. As Einstein wisely noted, "Not everything that can be counted counts, and not everything that counts can be counted."


-- The opinions expressed in this commentary are not necessarily those of the editors or of the American Heart Association --